We study the optimal memorization capacity of modern Hopfield models and Kernelized Hopfield Models (KHMs), a transformer-compatible class of Dense Associative Memories.
Given , J( ) and D( ) are the sample (i.e., a trajectory) . Note J( ) and D( ) are randomness J( ) and D( ) to denote and a Clearly we J( ) = E J( ) and D( ) = E D( ).